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Abstract 

Background: The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The 
sequences of these domains are very diverse and they share no common sequence motif. These domains serve a 
range of different functions within the proteins in which they are found, including both catalytic and non-catalytic 
versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from 
analysis of the proteins and organisms in which they are found. 

Results: Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, 
DUF3887 and DUF4878. Structures of representatives of each of these domains: BT_35 1 1 from Bacteroides 
thetaiotaomicron (strain VPI-5482) [PDB:3KZ7], Cj0202c from Campylobacter jejuni subsp. jejuni serotype 0:2 (strain 
NCTC 1 1 168) [PDB:3K7C], rumgna_01855) and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) 
[PDB:4HYZ] have been solved by X-ray crystallography. All three domains are similar in structure and all belong to 
the NTF2-like superfamily. Although the function of these domains remains unknown at present, our analysis 
enables us to present a hypothesis concerning their role. 

Conclusions: Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. 
This may regulate the activities of domains with which they are combined in the same polypeptide or via operonic 
linkages, such as signaling domains (e.g. serine/threonine protein kinase), peptidoglycan-processing hydrolases (e.g. 
NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons). 
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Background 

The NTF2-like superfamily is a large group of related 
proteins that share a common fold, first observed in the 
structure of the rat NTF2 (Nuclear Transport Factor 2) 
protein [1]. It is a versatile fold that can accommodate 
very different sequences and has no characteristic se- 
quence motif associated with it. The NTF2-like fold has 
a cone-like shape with a cavity inside and acts as a mo- 
lecular container that can be adapted to serve a broad 
range of different functions. 
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The NTF2-like proteins can be broadly defined into 
two functional categories: enzymatically active and non- 
enzymatically active proteins. The intracellular examples 
of this fold include most of the enzymatic functions 
associated with these proteins. These include SnoaL 
polyketide cyclase, scytalone dehydratase, limonene-1,2- 
epoxide hydrolase and 55-3-ketosteroid isomerase [2-5]. 
The extracellular NTF2-like proteins tend to be non- 
enzymatic and possess small molecule binding activity. 
Non- enzymatic members of this superfamily include 
NTF2 [1], a domain found at the C-terminus of cal- 
cium/calmodulin dependent protein kinase II which is 
responsible for the multimerization of these kinases [6] 
and Mbal, a protein which binds to ribosomes and may 
function as a receptor [7]. NTF2-like domains have been 
found in proteins involved in bacterial conjugation 
where a multiprotein complex, the type IV secretion 
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system, mediates transfer of plasmid DNA from a donor 
to a recipient bacterial cell [8-10]. More recently the 
non-catalytic NTF2-like domains have also been shown 
to function as immunity proteins in the bacterial poly- 
morphic toxin systems [11]. 

Release 27.0 of the Pfam database [12] includes 24 
different families as part of the NTF2 superfamily. Of 
these families, 21 have at least one representative where 
the three-dimensional structure has been deposited in 
the PDB. To date, the PDB contains at least 170 
structures with NTF2-like fold, including at least 27 
structures solved by the Joint Center for Structural 
Genomics (JCSG). Here we describe the first crystal 
structures of three Pfam families with NTF2-like 
folds: DUF3828 [PDB:3KZT] [Pfam:PF12883], DUF 
3887 [PDB:4HYZ] [Pfam: PF 13026] and DUF4878 [PDB: 
3K7C] [Pfam:PF12870]. 

Results and discussion 

Domain descriptions 

DUF3828 family [Pfam: PF12883] is annotated in Pfam 
as a domain of unknown function. It is present in 492 
different UniProtKB proteins from 451 different organ- 
isms. It is found exclusively in Gram-negative bacteria, 
with the vast majority of the species it occurs in belong- 
ing to the Enterobacteraceae family. [Pfam: PF12870] 
was previously annotated in Pfam as a lumazine-binding 
domain, however this has since been found to be incor- 
rect and so we have renamed this family as a domain of 
unknown function, DUF4878. This domain is present in 
650 different UniProtKB proteins from 571 different 
species. Like DUF3828, DUF4878 is a bacterial family, 
however it is found in a wider variety of bacterial 
species. It is found in both Gram-negative bacteria 
(including Proteobacteria and Bacteroidetes) and Gram- 
positive species (including Firmicutes and Actinobacteria). 
Finally, DUF3887 family [Pfam: PF13026] is another do- 
main of unknown function. This domain is present in 364 
different UniProtKB proteins from 262 different species. It 
is predominantly found in Firmicutes, but is also present in 
other phyla, including several Archaeal species. 

All three of these domains are of a similar length 
(around 100 amino acids). The N-terminus of DUF3828 
(Figure 1) contains a pair of conserved aromatic amino 
acids (phenylalanine and tyrosine). There is a conserved 
aspartic acid in the middle of the domain and close to 
this is a conserved glutamine. A conserved tryptophan is 
located near the C-terminus, closely followed by two 
conserved hydrophobic amino acids. DUF3887 (Figure 2) 
contains a highly conserved glycine in the middle of the 
domain and two conserved hydrophobic amino acids 
near the C-terminus. DUF4878 (Figure 3) contains a 
conserved glycine about 25 amino acids into the domain 
and a conserved tryptophan near the C-terminus. 



Domain architectures 

In Pfam, 224 of the 492 proteins (45%) containing 
DUF3828 also contain a DUF4878 domain at the C- 
terminus (Figure 4). Given the potential significance of 
this observation, we performed further investigation into 
the taxonomic distribution of these proteins. Using 
EvolView [14] we plotted a species tree containing mem- 
bers of the RP75 set of representative proteomes [15] 
which possess proteins containing DUF3828 and/or 
DUF4878 (Figure 5). Surprisingly, we found that only two 
of the 113 species in this tree possessed both domains, and 
therefore we conclude that the co-occurrence of these do- 
mains is not likely to be significant and is an artifact caused 
by the sequencing of a disproportionately large number of 
Escherichia coli strains compared to the other species these 
domains are found in. 

Besides the apparent co-occurrence of DUF3828 and 
DUF4878, the three DUFs also occur in several other ar- 
chitectures in Pfam. These can be split roughly into 
three categories: Architectures suggestive of communi- 
cation with extracellular ligand-sensing or intracellular 
signaling domains (Figure 4A), solo or multi-domain 
secreted and lipid-anchored architectures (Figure 4B) 
and fusions to C-terminal peptidase or other hydrolase 
domains (Figure 4C). 

In the first category (Figure 4A), the intracellular do- 
mains to which the extrinsic DUF is linked include pro- 
tein kinase domains and three distinct versions of zinc 
ribbons, which could potentially bind nucleic acids. 
These architectures are comparable to other signaling 
proteins where extracellular ligand domains are linked 
to intracellular signaling domains. This category also in- 
cludes fusions of the DUF with the sodium pump associ- 
ated oxaloacetate decarboxylase y chain (OAD gamma). 
In the third category (Figure 4C) we observed independent 
fusions to metallopeptidase (M23), DUF2324 (a transmem- 
brane domain which is a member of the Peptidase U clan), 
a beta-lactamase and an a/p hydrolase domain (Abhydro- 
lase). In all of these cases the DUF is present at the N- 
terminus and the hydrolase domain at the C-terminus. 
NTF2-like domains have been observed with a/p hydro- 
lases before: both SnoaL-like domain [Pfam:PF12680] and 
DUF4440 [Pfam:PF14534] co-occur with a/p hydrolase 
domains. 

Genomic context 

We studied the genomic context of proteins containing 
DUF3828, DUF3887 and DUF4878. In doing this we 
hoped to glean information about the possible function 
of these domains. As a result we uncovered a conserved 
association with DUF3828, which in diverse gammapro- 
teobacteria, betaprotebacteria and bacteroidetes is com- 
bined in an operon with a gene coding for a protein of 
the NlpC/P60 superfamily with a papain-like peptidase 
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fold (e.g. gi: 489959630 from Enterobacter cloacae) [17]. 
These domains are known to function as peptidases/am- 
idases in that cleave amide/peptide linkages in the bac- 
terial cell wall. Several of these proteins additionally 
contain further C-terminal domains such as EF-hands, 
metallopeptidase family M23 and glycohydrolases of the 
lysozyme [Pfam:PF00959] or the Chitinase Class I [Pfam: 
PF00182] families. Thus, domains point to catalytic 
activities that process both the peptide and glycosidic 
linkages in peptidoglycan. 

Structure description 

The crystal structure of a DUF3828 protein, BT_3511 
protein [UniProtKB:Q8AlZ7] from Bacteroides thetaio- 
taomicron (strain VPI-5482), was determined to 2.1 A 
resolution by MAD method and deposited to PDB as 
[PDB:3KZT]. The final model includes two molecules 
(residues 26-167), five 1,2-ethanediol, two sulfate ions 
and 118 water molecules in the asymmetric unit. The 
structure is mainly composed of three helices, one 
3i 0 helix and 4 beta strands. GlyO (that remained at the 



N-terminus after cleavage of the expression/purification 
tag), the region from Lys26 to Pro34 was disordered and 
not modeled. All the side chains were fully modeled be- 
cause of the complete electron density. The Matthews 
coefficient (V M ) is 2.05 A 3 Da" 1 and the estimated solv- 
ent content is 39.97%. The Ramachandran plot produced 
by MolProbity [18] shows that 96.9% of the residues are 
in favored regions, with no outliers. 

The crystal structure of a DUF4878 protein, Cj0202c 
protein [UniProtKB:Q0PBT7] from Campylobacter 
jejuni subsp. jejuni serotype 0:2 (strain NCTC 11168), 
was determined to 2.0 A resolution by MAD method 
and was deposited to PDB as [PDB:3K7C]. The final 
model includes four molecules (residues 1-113), one 
chloride ion, thirteen di hydroxyethyl ether (PEG), six 
triethylene glycol and 134 water molecules in the asym- 
metric unit. The structure is mainly composed of three 
helices and four beta strands. GlyO (which remained at 
the N-terminus after cleavage of the expression/purifica- 
tion tag), the region from Metl to Ser5 was disordered 
and not modeled. All the side chains were fully modeled 
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Figure 1 Sequence alignment of DUF3828. Conserved residues are highlighted in open red boxes. The secondary structure is shown above 
the alignment. The alignment was displayed using ESPript [13]. 
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Figure 2 Sequence alignment of DUF3887. Conserved residues are highlighted in open red boxes. The secondary structure is shown above 
the alignment. The alignment was displayed using ESPript [13]. 



because of the complete electron density. The Matthews 
coefficient (V M ;) is 2.3 A 3 Da" 1 and the estimated solvent 
content is 46.46%. The Ramachandran plot produced 
by MolProbity shows that 97.4% of the residues are in 
favoured regions, with no outliers. 

The crystal structure of a DUF3887 protein, the hypo- 
thetical protein Rumgna_01855 [UniProtKB:A7B2S7] 
from Ruminococcus gnavus (strain ATCC 29149), was 
determined to 2.25 A resolution by MAD method and 
was deposited to PDB as [PDB:4HYZ]. The final model 
includes two molecules (residues 36-149), six chloride 
ions, six sulfate ions, eight glycerol and 107 water mole- 
cules in the asymmetric unit. The structure is mainly 



composed of four helices, four turns, and five beta 
strands. Only GlyO (that remained at the N-terminus 
after cleavage of the expression/purification tag) was dis- 
ordered and not modeled. All the side chains were fully 
modeled because of the complete electron density. The 
Matthews coefficient (V M ) is 3.00 A 3 Da" 1 and the esti- 
mated solvent content is 58.98%. The Ramachandran 
plot produced by MolProbity shows that 99.6% of the res- 
idues are in favored regions, with no outliers. 

Comparison of these three structures showed that they 
are significantly similar to each other, especially for 
[PDB:3K7C] and [PDB:4HYZ]. FATCAT results showed 
that the structures of [PDB:3KZT] and [PDB:4HYZ] are 
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Figure 3 Sequence alignment of DUF4878. Conserved residues are highlighted in open red boxes. The secondary structure is shown above 
the alignment. The alignment was displayed using ESPript [13]. 



significantly similar with P-value of 1.53e- 03 and the 
structure alignment has 85 equivalent positions with an 
RMSD of 1.72 A; the structures of [PDB:3KZT] and 
[PDB:3K7C] are significantly similar with P-value of 
2.57e- 03 and the structure alignment has 95 equivalent 
positions with an RMSD of 2.53 A; the structures of 
[PDB:4HYZ] and [PDB:3K7C] are significantly similar 
with P-value of 1.55e- 06 and the structure alignment has 
95 equivalent positions with an RMSD of 2.99 A [19]. In 
all cases, FATCAT program detected flexibility in the 
structure, mostly limited to the relative position of heli- 
ces with respect to the central beta sheet. There is low 
sequence similarity between the three structures, and no 
positions are conserved in all three structures. 

The three structures ([PDB:3K7C], [PDB:3KZT] and 
[PDB:4HYZ]) all possess NTF2-like folds, despite being 
dissimilar in sequence (Figures 6 and 7, Table 1). A 
hydrophobic cavity with the potential for ligand-binding 
has been described in NTF2-like proteins before [20]. 
We used the MarkUs functional annotation server to lo- 
cate potential cavities within the three structures [21]. 
All three structures contain predicted cavities, but the 
position of these cavities is not conserved between the 
structures. The structure of [PDB:3K7C] differs from 
that of [PDB:3KZT] and [PDB:4HYZ] in that it lacks the 
edge strands in the beta-sheet but has a longer helix on 
the opposite side. It has a shallow cavity with positive 



electrostatic potential that contains a bound PEG mol- 
ecule in the crystal structure. Notably, in dimers seen in 
the crystal structure of [PDB:3K7C] the cavities of indi- 
vidual subunits combine in a contiguous groove that can 
accommodate a larger ligand than a conventional NTF- 
like fold can. [PDB:4HYZ] has a cavity of a similar size 
in a similar position with weakly positive electrostatic 
potential. Sequence conservation in the region of the 
cavities in [PDB:3K7C] and in [PDB:4HYZ] is poor. In 
contrast, the cavity found in [PDB:3KZT] has a negative 
electrostatic potential, this cavity includes two highly 
conserved aspartic acid residues (D-103 and D-110) 
which may be of significance. 

DALI [24] searches revealed that [PDB:3KZC] and 
[PDB:4HYZ] are more similar to each other than they 
are to most other members of the NTF2-like superfamily 
(Z-score 9.8), however [PDB:3KZT] is more distantly 
related. [PDB:3KZT] is most similar to [PDB:2UX0] 
(Z-score 8.8) which contains a Calcium/calmodulin 
dependent protein kinase II association domain [Pfam: 
PF08332], also a member of the NTF2-like superfamily. 
This domain functions as an oligomerisation domain 
[25]. It is also significantly similar to [PDB:2BHM] 
(Z-score 8.5), a member of the VirB8 family [Pfam: 
PF04335], a component of the type IV secretion system 
[10]. It is significantly less similar to [PDB:3K7C] (max- 
imum Z-score of 6.4 when compared to chain A), and 
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Figure 4 Domain architectures of selected sequences containing NTF2-like domains. Domain architectures were predicted by Pfam [12]. 
Signal peptides and transmembrane regions were predicted using Phobius [16]. The NTF2-liks DUFs are shown in blue, zinc ribbons in green, 
TPRs yellow, peptidases and other hydrolases in pink, other DUFs in bright green, and signal peptides (S), lipoboxes (SL) and transmembrane 
regions (TM) in yellow. Panel A shows architectures with extracellular ligand-sensing or intracellular signaling domains. Panel B shows secreted 
and lipid-anchored architectures. Panel C shows architectures including hydrolase domains. 



[PDB:4HYZ] (maximum Z-score of 6.8 when compared 
to chain B). [PDB:4HYZ] and [PDB:3K7C] are most 
similar to members of the SnoAL_3 family [Pfam: 
PF13474] including [PDB:3GWR] (Z-score 10.1 when 
compared to [PDB:3K7C]), and the SnoAL_2 family 
[Pfam:PF12860] including [PDB:3D9R] (Z-score 9.5 
when compared to [PDB:3K7C]). 



Potential function 

NTF2-like domains include both catalytic and non- 
catalytic versions that tend to bind small molecules 
using a common substrate-binding pocket. Our analysis 
of these DUFs did not reveal conserved polar residues 
suggestive of catalytic activity in DUF3887 or DUF4878. 
DUF3828 contains a conserved aspartic acid, which 
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Figure 5 Species distribution of DUF3828 and DUF4878. Species tree was plotted using EvolView [14]. Red dots denote the presence of 
DUF3828 in a protein from the named species; blue dots denote the presence of DUF4878. Green stars indicate species that both domains are 
present in the species. 



could point to a catalytic function. NTF2-like domains 
which are enzymatic tend to occur in an intracellular 
context, however prediction of subcellular localization 
using Phobius [16] revealed the consistent presence of 
either N-terminal secretory signals or lipoboxes with a 
conserved cysteine which helps anchor the protein to 
the membrane. Those proteins that lack either of these 
features have transmembrane regions with predicted 
membrane topologies suggestive of an extracellular 



location for the DUF (Figure 4). Together these observa- 
tions suggested that these three DUFs are novel NTF2- 
like domains that are likely to be extracellular domains 
that recognize a small molecule ligand via their binding 
pocket. 

Further evidence for such a function is offered by the 
domain architectures of these proteins (Figure 4). Where 
the DUF is found at the N-terminus of OAD y chain do- 
main the sensing of a ligand could help allosterically 
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4HYZ 3K7C 3KZT 

Figure 6 Structures of representatives of DUF3828, DUF3887, and DUF4878. [PDB:4HYZ] is a member of DUF3887, [PDB:3K7q is a member of 
DUF4878 and [PDB:3K^H is a member of DUF3828. Structures were aligned with POSA [22] and hydrophobic surface plots generated in Chimera [23]. 



regulate sodium flux [26]. Where the DUF is found at 
the N-terminus of a protein containing a C-terminal 
peptidase or other hydrolase domain, it is conceivable 
that the sensing of a ligand by the N-terminal DUF 
regulates the catalytic domain. Similar domain archi- 
tecture associations were also observed for DUF4352, 




Figure 7 Structural superimposition of DUF3887 (blue), 
DUF4878 (yellow), and DUF3828 (magenta). 



which occurs fused to DUF4878 in certain contexts: 
DUF4352 is also linked to metallopeptidase (M56), 
protein kinase and TPR repeats and is also associated 
with lipid attachment signal or signal peptides or 
transmembrane regions. Hence it is possible that 
the two domains perform comparable functions and 
cooperate in recognition of extracellular ligands on oc- 
casions. The versions combined in operons with the 
NlpC/P60 like peptidases/amidases might potentially 
regulate the export and/or the activity of these 
peptidoglycan hydrolyzing proteins that could have a 
potentially suicidal effect on the cell. Thus, they could 
play a role in regulating peptidoglycan remodeling. 



Conclusions 

Here we present a comparison of first crystal structures 
of three DUFs belonging to the NTF2-like superfamily. 
This work expands our structural knowledge of the 



Table 1 Percentage identity of the three proteins for 
which structure has been determined, calculated using 
DALI [24] 
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sequence diverse NTF2 superfamily. Analysis of the 
three-dimensional structure, sequence and associated 
domains can provide clues about the likely function of a 
protein domain. We present a detailed analysis of these 
three domains, which suggests that they may play a role 
in binding to small molecule ligands. 

Methods 

Sequence and gene context analysis 

Data for families DUF3828 and DUF4878 are taken from 
Pfam release 27.0 [12]. The definition of DUF3887 has 
been improved during the course of this work and the 
updated version will form a part of Pfam release 28.0. 
Signal peptides and transmembrane domains were pre- 
dicted using Phobius [16]. A phylogenetic tree was con- 
structed from proteomes in representative proteomes 
RP75 [15] using the NCBI taxonomy common tree [27]. 
This was annotated and displayed using EvolView [14]. 

With the DUF genes as anchors, the gene neighbour- 
hood was also comprehensively analyzed using a custom 
Perl script. This script uses either the PTT file (down- 
loadable from the NCBI ftp site) or the Genbank file in 
the case of whole genome shot gun sequences to extract 
the neighbors of a given query gene. The protein 
sequences of all neighbors were clustered using the 
BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/doc- 
uments/blastclust.html) to identify related sequences in 
gene neighbourhoods. Each cluster of homologous pro- 
teins were then assigned an annotation based on the do- 
main architecture or conserved shared domain which 
were detected using Pfam models and in-house profiles 
run using RPS -BLAST [28]. This allowed an initial an- 
notation of gene neighbourhoods and their grouping 
based on conservation of neighborhood associations. In 
further analysis care was taken to ensure that genes are 
unidirectional on the same strand of DNA and shared a 
putative common promoter to be counted as a single 
operon. If they were head to head on opposite strands 
they were examined for potential bidirection promoter 
sharing patterns. 

Structure determination 

Protein purification and crystallization was performed by 
the JCSG crystallomics core [29-31]. All X-ray diffrac- 
tion data were collected at the Stanford Synchrotron Ra- 
diation Lightsource (SSRL) on beamline 11-1. Data sets 
were collected at 100 K using a Rayonix MX-325 CCD 
detector. X-ray diffraction data were collected from a 
single crystal at wavelengths corresponding to the inflec- 
tion (k i), high energy remote (X 2 )> and peak (A 3 ) 
[PDB:3K7C]; the peak(X !), inflection (X 2 ), and high 
energy remote (X 3 ) [PDB: 4HYZ]; or the inflection 
(X i) and high energy remote (X 2 ) [PDB:3KZT], of a 
multi-wavelength or a two-wavelength selenium multi- 



wavelength anomalous diffraction (MAD). The data 
were integrated and scaled using the XDS and 
XSCALE programs respectively [32,33] [PDB:3K7C] 
or the MOSFLM [34] and SCALA [35] programs 
[PDB:3KZT][PDB:4HYZ]. Data statistics are summa- 
rized in Additional file 1: Tables S1-S3. The selenium 
substructures for the three proteins were solved with 
SHELXD [36] and the MAD phases were refined with 
autoSHARP [37]. Iterative automated model building 
was performed with RESOLVE [38] at a resolution of 
2.00 A [PDB:3K7C] or Arp/Warp [39] at a resolution 
of 2.15 A [PDB:3KZT] or with Buccaneer [40,41] at 
a resolution of 2.25 A [PDB:4HYZ] from density- 
modified electron density. Model completion was 
performed using the interactive computer-graphics 
program COOT [42] and MAD-phase-restrained re- 
finement was accomplished using the program REFMAC 
ver 5.5.0102 [PDB:3K7C], ver 5.5.0053 [PDB:3KZT] [43] or 
BUSTER ver 2.10.0 [44] [PDB:4HYZ]. 

Structure validation and deposition 

The quality of the crystal structure was analyzed using 
the JCSG Quality Control Server [45]. This server 
verifies: the stereochemical quality of the model 
using AutoDepInputTool, MolProbity and WHATIF 
5.0 [18,46,47]; agreement between the atomic model 
and the data using SFcheck 4.0 and RESOLVE [38,48]; 
the protein sequence using CLUSTALW [49]; atom oc- 
cupancies using MOLEMAN2.0 [50]; and consistency 
of NCS pairs. It also evaluates differences in Rcryst/ 
Rfree, expected Rfree/Rcryst, and maximum/minimum 
B-values by parsing the refinement log-file and PDB 
header. Protein quaternary structure analysis used the 
EBI PISA server [51]. Atomic coordinates and experi- 
mental structure factors have been deposited in the 
PDB and are accessible under the codes [PDB:3KZT], 
[PDB:3K7C] and [PDB:4HYZ]. Electrostatic potential 
and cavity prediction was performed using the MarkUs 
functional annotation server [21]. 

Availabilty of supporting data 

The data sets supporting the results of this article are 
included within the article (and its Additional file 1: 
Tables S1-S3). 

Additional file 



Additional file 1: Table SI. Data collection and refinement statistics 
(PDB 3kzt). Table S2. Data collection and refinement statistics (PDB 3k7c). 
Table S3. Data collection and refinement statistics (PDB 4hyz). 
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